161 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
English Finnish French German Russian Swedish
Availability:
Freely Available
License:
CC-BY-NC
Size:
2 GByte
Production Status:
Existing-used
Use:
Textual Entailment and Paraphrasing
- Paper title: Comparative Study of Sentence Embeddings for Contextual Paraphrasing
- Paper track: Evaluation/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Louisa Pragst | Opusparcus | /N |
Documentation:
None
Written
Language Type:
Monolingual
Languages:
Russian
Availability:
Freely Available
License:
MIT
Size:
20 GByte
Production Status:
Newly created-finished
Use:
Text generation, text mining, named entity recognition, question answering, knowledge discovery
- Paper title: Humans Keep It One Hundred: an Overview of AI Journey
- Paper track: Evaluation/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Tatiana Shavrina | AI Journey Exam Solution | /N |
Documentation:
English and Russian
Written
Treebank,
Language Type:
Monolingual
Languages:
Afrikaans Akkadian Amharic Ancient Greek Arabic Armenian Assyrian Bambara Basque Belarusian Bhojpuri Breton Bulgarian Buryat Cantonese Catalan Chinese Classical Chinese Coptic Croatian Czech Danish Dutch English Erzya Estonian Faroese Finnish French Galician German Gothic Greek Hebrew Hindi Hindi English Hungarian Indonesian Irish Italian Japanese Karelian Kazakh Komi Permyak Komi Zyrian Korean Kurmanji Latin Latvian Lithuanian Livvi Maltese Marathi Mbya Guarani Moksha Naija North Sami Norwegian Old Church Slavonic Old French Old Russian Persian Polish Portuguese Romanian Russian Sanskrit Scottish Gaelic Serbian Skolt Sami Slovak Slovenian Spanish Swedish Swedish Sign Language Swiss German Tagalog Tamil Telugu Thai Turkish Ukrainian Upper Sorbian Urdu Uyghur Vietnamese Warlpiri Welsh Wolof Yoruba
Availability:
Freely Available
License:
Various
Size:
25 million words
Production Status:
Existing-updated
Use:
Parsing and Tagging
- Paper title: Universal Dependencies v2: An Evergrowing Multilingual Treebank Collection
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Joakim Nivre | Universal Dependencies | /N |
Documentation:
https://universaldependencies.org
Written
Corpus,
Language Type:
Monolingual
Languages:
Arabic Chinese Czech English Finnish French German Hindi Indonesian Italian Japanese Korean Polish Portuguese Russian Spanish Swedish Thai Turkish
Availability:
Freely Available
License:
CC-BY-SA
Size:
300 KByte
Production Status:
Newly created-finished
Use:
Emotion Recognition/Generation
- Paper title: How Universal are Universal Dependencies? Exploiting Syntax for Multilingual Clause-level Sentiment Detection
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Hiroshi Kanayama | Parallel Sentiment | /N |
Documentation:
For 19 languages (ar,cs,de,en,es,fi,fr,hi,id,it,ja,ko,pl,pt,ru,sv,th,tr,zh)
Written
Evaluation Data,
Language Type:
Multilingual
Languages:
Croatian English Estonian Finnish Latvian Lithuanian Russian Slovenian Swedish
Availability:
Freely Available
License:
CC-BY-SA
Size:
1446954 entries
Production Status:
Newly created-finished
Use:
Evaluation/Validation
- Paper title: Multilingual Culture-Independent Word Analogy Datasets
- Paper track: Evaluation/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Matej Ulčar | Multilingual Culture-Independent Word Analogy Datasets | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Arabic Bulgarian Catalan Croatian Czech Danish Dutch English Estonian Filipino Finnish French German Greek Hebrew Hindi Hungarian Indonesian Italian Japanese Korean Latvian Lithuanian Malay Norwegian Persian Polish Portuguese Romanian Russian Serbian Simplified Chinese Slovak Slovenian Spanish Swedish Thai Traditional Chinese Turkish Ukrainian Vietnamese
Availability:
Freely Available
License:
CC-BY-SA
Size:
60 GByte
Production Status:
Newly created-in progress
Use:
Language Modelling
- Paper title: Wiki-40B: Multilingual Language Model Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rami Al-Rfou | Wiki40B-LM | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
<Not Specified>
Size:
22.10G tokens
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | OpenSubtitles2018 | /N |
Documentation:
Yes, on the website.
Written
Lexicon,
Language Type:
Monolingual
Languages:
Afrikaans Albanian Arabic Armenian Bangla Basque Bosnian Breton Bulgarian Catalan Croatian Czech Danish Dutch English Esperanto Estonian Filipino Finnish French Galician Georgian German Greek Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Latvian Lithuanian Macedonian Malay Malayalam Norwegian Persian Polish Portuguese Romanian Russian Serbian Sinhala Slovak Slovenian Spanish Swedish Tamil Telugu Thai Turkish Ukrainian Urdu Vietnamese pt_br ze_en ze_zh zh_cn zh_tw
Availability:
Freely Available
License:
Creative Commons Attribution 4.0 International
Size:
41 GByte
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yo Joong Choe | word2word | /N |
Documentation:
Yes, on the website.
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Dolgan English Russian
Availability:
Freely Available
License:
https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode
Size:
None
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Processing Language Resources of Under-Resourced and Endangered Languages for the Generation of Augmentative Alternative Communication Boards
- Paper track: Speech/oral presentation
- Paper status: Accept Poster+DemoSuggested
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Anne Ferger | INEL Dolgan Corpus 1.0 | /N |
Documentation:
https://corpora.uni-hamburg.de/hzsk/de/islandora/object/file:dolgan-1.0_INEL_Dolgan_Corpus_1.0_User_Documentation/datastream/PDF/INEL_Dolgan_Corpus.pdf
Written
Corpus,
Language Type:
Bilingual
Languages:
English Russian
Availability:
Freely Available
License:
Creative Commons
Size:
2.3 million tokens
Production Status:
Existing-updated
Use:
Document Classification, Text categorisation
- Paper title: Lexicogrammatic translationese across two targets and competence levels
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ekaterina Lapshinova-Koltunski | RusLTC | /N |
Documentation:
https://www.rus-ltc.org/static/html/about.html (in English and Russian)




